{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Practice Lab: Neural Networks for Handwritten Digit Recognition, Multiclass \n",
    "\n",
    "In this exercise, you will use a neural network to recognize the hand-written digits 0-9.\n",
    "\n",
    "\n",
    "# Outline\n",
    "- [ 1 - Packages ](#1)\n",
    "- [ 2 - ReLU Activation](#2)\n",
    "- [ 3 - Softmax Function](#3)\n",
    "  - [ Exercise 1](#ex01)\n",
    "- [ 4 - Neural Networks](#4)\n",
    "  - [ 4.1 Problem Statement](#4.1)\n",
    "  - [ 4.2 Dataset](#4.2)\n",
    "  - [ 4.3 Model representation](#4.3)\n",
    "  - [ 4.4 Tensorflow Model Implementation](#4.4)\n",
    "  - [ 4.5 Softmax placement](#4.5)\n",
    "    - [ Exercise 2](#ex02)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a name=\"1\"></a>\n",
    "## 1 - Packages \n",
    "\n",
    "First, let's run the cell below to import all the packages that you will need during this assignment.\n",
    "- [numpy](https://numpy.org/) is the fundamental package for scientific computing with Python.\n",
    "- [matplotlib](http://matplotlib.org) is a popular library to plot graphs in Python.\n",
    "- [tensorflow](https://www.tensorflow.org/) a popular platform for machine learning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "from tensorflow.keras.models import Sequential\n",
    "from tensorflow.keras.layers import Dense\n",
    "from tensorflow.keras.activations import linear, relu, sigmoid\n",
    "%matplotlib widget\n",
    "import matplotlib.pyplot as plt\n",
    "plt.style.use('./deeplearning.mplstyle')\n",
    "\n",
    "import logging\n",
    "logging.getLogger(\"tensorflow\").setLevel(logging.ERROR)\n",
    "tf.autograph.set_verbosity(0)\n",
    "\n",
    "from public_tests import * \n",
    "\n",
    "from autils import *\n",
    "from lab_utils_softmax import plt_softmax\n",
    "np.set_printoptions(precision=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"2\"></a>\n",
    "## 2 - ReLU Activation\n",
    "This week, a new activation was introduced, the Rectified Linear Unit (ReLU). \n",
    "$$ a = max(0,z) \\quad\\quad\\text {# ReLU function} $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "0aef69aa4f6146dfb4e92e7786bf102b",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt_act_trio()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img align=\"right\" src=\"./images/C2_W2_ReLu.png\"     style=\" width:380px; padding: 10px 20px; \" >\n",
    "The example from the lecture on the right shows an application of the ReLU. In this example, the derived \"awareness\" feature is not binary but has a continuous range of values. The sigmoid is best for on/off or binary situations. The ReLU provides a continuous linear relationship. Additionally it has an 'off' range where the output is zero.     \n",
    "The \"off\" feature makes the ReLU a Non-Linear activation. Why is this needed? This enables multiple units to contribute to to the resulting function without interfering. This is examined more in the supporting optional lab. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a name=\"3\"></a>\n",
    "## 3 - Softmax Function\n",
    "A multiclass neural network generates N outputs. One output is selected as the predicted answer. In the output layer, a vector $\\mathbf{z}$ is generated by a linear function which is fed into a softmax function. The softmax function converts $\\mathbf{z}$  into a probability distribution as described below. After applying softmax, each output will be between 0 and 1 and the outputs will sum to 1. They can be interpreted as probabilities. The larger inputs to the softmax will correspond to larger output probabilities.\n",
    "<center>  <img  src=\"./images/C2_W2_NNSoftmax.PNG\" width=\"600\" />  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The softmax function can be written:\n",
    "$$a_j = \\frac{e^{z_j}}{ \\sum_{k=0}^{N-1}{e^{z_k} }} \\tag{1}$$\n",
    "\n",
    "Where $z = \\mathbf{w} \\cdot \\mathbf{x} + b$ and N is the number of feature/categories in the output layer.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"ex01\"></a>\n",
    "### Exercise 1\n",
    "Let's create a NumPy implementation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# UNQ_C1\n",
    "# GRADED CELL: my_softmax\n",
    "\n",
    "def my_softmax(z):  \n",
    "    \"\"\" Softmax converts a vector of values to a probability distribution.\n",
    "    Args:\n",
    "      z (ndarray (N,))  : input data, N features\n",
    "    Returns:\n",
    "      a (ndarray (N,))  : softmax of z\n",
    "    \"\"\"    \n",
    "    ### START CODE HERE ### \n",
    "    ez = np.exp(z)\n",
    "    a = ez/np.sum(ez)\n",
    "    ### END CODE HERE ### \n",
    "    return a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "my_softmax(z):         [0.03 0.09 0.24 0.64]\n",
      "tensorflow softmax(z): [0.03 0.09 0.24 0.64]\n",
      "\u001b[92m All tests passed.\n"
     ]
    }
   ],
   "source": [
    "z = np.array([1., 2., 3., 4.])\n",
    "a = my_softmax(z)\n",
    "atf = tf.nn.softmax(z)\n",
    "print(f\"my_softmax(z):         {a}\")\n",
    "print(f\"tensorflow softmax(z): {atf}\")\n",
    "\n",
    "# BEGIN UNIT TEST  \n",
    "test_my_softmax(my_softmax)\n",
    "# END UNIT TEST  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "  <summary><font size=\"3\" color=\"darkgreen\"><b>Click for hints</b></font></summary>\n",
    "    One implementation uses for loop to first build the denominator and then a second loop to calculate each output.\n",
    "    \n",
    "```python\n",
    "def my_softmax(z):  \n",
    "    N = len(z)\n",
    "    a =                     # initialize a to zeros \n",
    "    ez_sum =                # initialize sum to zero\n",
    "    for k in range(N):      # loop over number of outputs             \n",
    "        ez_sum +=           # sum exp(z[k]) to build the shared denominator      \n",
    "    for j in range(N):      # loop over number of outputs again                \n",
    "        a[j] =              # divide each the exp of each output by the denominator   \n",
    "    return(a)\n",
    "```\n",
    "<details>\n",
    "  <summary><font size=\"3\" color=\"darkgreen\"><b>Click for code</b></font></summary>\n",
    "   \n",
    "```python\n",
    "def my_softmax(z):  \n",
    "    N = len(z)\n",
    "    a = np.zeros(N)\n",
    "    ez_sum = 0\n",
    "    for k in range(N):                \n",
    "        ez_sum += np.exp(z[k])       \n",
    "    for j in range(N):                \n",
    "        a[j] = np.exp(z[j])/ez_sum   \n",
    "    return(a)\n",
    "\n",
    "Or, a vector implementation:\n",
    "\n",
    "def my_softmax(z):  \n",
    "    ez = np.exp(z)              \n",
    "    a = ez/np.sum(ez)           \n",
    "    return(a)\n",
    "\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below, vary the values of the `z` inputs. Note in particular how the exponential in the numerator magnifies small differences in the values. Note as well that the output values sum to one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2b5ea2c37bba41209c8e2157f63c2738",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.close(\"all\")\n",
    "plt_softmax(my_softmax)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "<a name=\"4\"></a>\n",
    "## 4 - Neural Networks\n",
    "\n",
    "In last weeks assignment, you implemented a neural network to do binary classification. This week you will extend that to multiclass classification. This will utilize the softmax activation.\n",
    "\n",
    "\n",
    "<a name=\"4.1\"></a>\n",
    "### 4.1 Problem Statement\n",
    "\n",
    "In this exercise, you will use a neural network to recognize ten handwritten digits, 0-9. This is a multiclass classification task where one of n choices is selected. Automated handwritten digit recognition is widely used today - from recognizing zip codes (postal codes) on mail envelopes to recognizing amounts written on bank checks. \n",
    "\n",
    "\n",
    "<a name=\"4.2\"></a>\n",
    "### 4.2 Dataset\n",
    "\n",
    "You will start by loading the dataset for this task. \n",
    "- The `load_data()` function shown below loads the data into variables `X` and `y`\n",
    "\n",
    "\n",
    "- The data set contains 5000 training examples of handwritten digits $^1$.  \n",
    "\n",
    "    - Each training example is a 20-pixel x 20-pixel grayscale image of the digit. \n",
    "        - Each pixel is represented by a floating-point number indicating the grayscale intensity at that location. \n",
    "        - The 20 by 20 grid of pixels is “unrolled” into a 400-dimensional vector. \n",
    "        - Each training examples becomes a single row in our data matrix `X`. \n",
    "        - This gives us a 5000 x 400 matrix `X` where every row is a training example of a handwritten digit image.\n",
    "\n",
    "$$X = \n",
    "\\left(\\begin{array}{cc} \n",
    "--- (x^{(1)}) --- \\\\\n",
    "--- (x^{(2)}) --- \\\\\n",
    "\\vdots \\\\ \n",
    "--- (x^{(m)}) --- \n",
    "\\end{array}\\right)$$ \n",
    "\n",
    "- The second part of the training set is a 5000 x 1 dimensional vector `y` that contains labels for the training set\n",
    "    - `y = 0` if the image is of the digit `0`, `y = 4` if the image is of the digit `4` and so on.\n",
    "\n",
    "$^1$<sub> This is a subset of the MNIST handwritten digit dataset (http://yann.lecun.com/exdb/mnist/)</sub>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load dataset\n",
    "X, y = load_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4.2.1 View the variables\n",
    "Let's get more familiar with your dataset.  \n",
    "- A good place to start is to print out each variable and see what it contains.\n",
    "\n",
    "The code below prints the first element in the variables `X` and `y`.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "print ('The first element of X is: ', X[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The first element of y is:  0\n",
      "The last element of y is:  9\n"
     ]
    }
   ],
   "source": [
    "print ('The first element of y is: ', y[0,0])\n",
    "print ('The last element of y is: ', y[-1,0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4.2.2 Check the dimensions of your variables\n",
    "\n",
    "Another way to get familiar with your data is to view its dimensions. Please print the shape of `X` and `y` and see how many training examples you have in your dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The shape of X is: (5000, 400)\n",
      "The shape of y is: (5000, 1)\n"
     ]
    }
   ],
   "source": [
    "print ('The shape of X is: ' + str(X.shape))\n",
    "print ('The shape of y is: ' + str(y.shape))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4.2.3 Visualizing the Data\n",
    "\n",
    "You will begin by visualizing a subset of the training set. \n",
    "- In the cell below, the code randomly selects 64 rows from `X`, maps each row back to a 20 pixel by 20 pixel grayscale image and displays the images together. \n",
    "- The label for each image is displayed above the image "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f66746b327ff4eb4a92df796eff793bb",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import warnings\n",
    "warnings.simplefilter(action='ignore', category=FutureWarning)\n",
    "# You do not need to modify anything in this cell\n",
    "\n",
    "m, n = X.shape\n",
    "\n",
    "fig, axes = plt.subplots(8,8, figsize=(5,5))\n",
    "fig.tight_layout(pad=0.13,rect=[0, 0.03, 1, 0.91]) #[left, bottom, right, top]\n",
    "\n",
    "#fig.tight_layout(pad=0.5)\n",
    "widgvis(fig)\n",
    "for i,ax in enumerate(axes.flat):\n",
    "    # Select random indices\n",
    "    random_index = np.random.randint(m)\n",
    "    \n",
    "    # Select rows corresponding to the random indices and\n",
    "    # reshape the image\n",
    "    X_random_reshaped = X[random_index].reshape((20,20)).T\n",
    "    \n",
    "    # Display the image\n",
    "    ax.imshow(X_random_reshaped, cmap='gray')\n",
    "    \n",
    "    # Display the label above the image\n",
    "    ax.set_title(y[random_index,0])\n",
    "    ax.set_axis_off()\n",
    "    fig.suptitle(\"Label, image\", fontsize=14)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"4.3\"></a>\n",
    "### 4.3 Model representation\n",
    "\n",
    "The neural network you will use in this assignment is shown in the figure below. \n",
    "- This has two dense layers with ReLU activations followed by an output layer with a linear activation. \n",
    "    - Recall that our inputs are pixel values of digit images.\n",
    "    - Since the images are of size $20\\times20$, this gives us $400$ inputs  \n",
    "    \n",
    "<img src=\"images/C2_W2_Assigment_NN.png\" width=\"600\" height=\"450\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- The parameters have dimensions that are sized for a neural network with $25$ units in layer 1, $15$ units in layer 2 and $10$ output units in layer 3, one for each digit.\n",
    "\n",
    "    - Recall that the dimensions of these parameters is determined as follows:\n",
    "        - If network has $s_{in}$ units in a layer and $s_{out}$ units in the next layer, then \n",
    "            - $W$ will be of dimension $s_{in} \\times s_{out}$.\n",
    "            - $b$ will be a vector with $s_{out}$ elements\n",
    "  \n",
    "    - Therefore, the shapes of `W`, and `b`,  are \n",
    "        - layer1: The shape of `W1` is (400, 25) and the shape of `b1` is (25,)\n",
    "        - layer2: The shape of `W2` is (25, 15) and the shape of `b2` is: (15,)\n",
    "        - layer3: The shape of `W3` is (15, 10) and the shape of `b3` is: (10,)\n",
    ">**Note:** The bias vector `b` could be represented as a 1-D (n,) or 2-D (n,1) array. Tensorflow utilizes a 1-D representation and this lab will maintain that convention: \n",
    "               "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"4.4\"></a>\n",
    "### 4.4 Tensorflow Model Implementation\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Tensorflow models are built layer by layer. A layer's input dimensions ($s_{in}$ above) are calculated for you. You specify a layer's *output dimensions* and this determines the next layer's input dimension. The input dimension of the first layer is derived from the size of the input data specified in the `model.fit` statement below. \n",
    ">**Note:** It is also possible to add an input layer that specifies the input dimension of the first layer. For example:  \n",
    "`tf.keras.Input(shape=(400,)),    #specify input shape`  \n",
    "We will include that here to illuminate some model sizing."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"4.5\"></a>\n",
    "### 4.5 Softmax placement\n",
    "As described in the lecture and the optional softmax lab, numerical stability is improved if the softmax is grouped with the loss function rather than the output layer during training. This has implications when *building* the model and *using* the model.  \n",
    "Building:  \n",
    "* The final Dense layer should use a 'linear' activation. This is effectively no activation. \n",
    "* The `model.compile` statement will indicate this by including `from_logits=True`.\n",
    "`loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True) `  \n",
    "* This does not impact the form of the target. In the case of SparseCategorialCrossentropy, the target is the expected digit, 0-9.\n",
    "\n",
    "Using the model:\n",
    "* The outputs are not probabilities. If output probabilities are desired, apply a softmax function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"ex02\"></a>\n",
    "### Exercise 2\n",
    "\n",
    "Below, using Keras [Sequential model](https://keras.io/guides/sequential_model/) and [Dense Layer](https://keras.io/api/layers/core_layers/dense/) with a ReLU activation to construct the three layer network described above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# UNQ_C2\n",
    "# GRADED CELL: Sequential model\n",
    "tf.random.set_seed(1234) # for consistent results\n",
    "model = Sequential(\n",
    "    [               \n",
    "        ### START CODE HERE ### \n",
    "        tf.keras.layers.InputLayer((400,)),\n",
    "        tf.keras.layers.Dense(25, activation=\"relu\", name=\"L1\"),\n",
    "        tf.keras.layers.Dense(15, activation=\"relu\", name=\"L2\"),\n",
    "        tf.keras.layers.Dense(10, activation=\"linear\", name=\"L3\")\n",
    "        ### END CODE HERE ### \n",
    "    ], name = \"my_model\" \n",
    ")\n",
    "model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model: \"my_model\"\n",
      "_________________________________________________________________\n",
      " Layer (type)                Output Shape              Param #   \n",
      "=================================================================\n",
      " L1 (Dense)                  (None, 25)                10025     \n",
      "                                                                 \n",
      " L2 (Dense)                  (None, 15)                390       \n",
      "                                                                 \n",
      " L3 (Dense)                  (None, 10)                160       \n",
      "                                                                 \n",
      "=================================================================\n",
      "Total params: 10,575\n",
      "Trainable params: 10,575\n",
      "Non-trainable params: 0\n",
      "_________________________________________________________________\n"
     ]
    }
   ],
   "source": [
    "model.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "  <summary><font size=\"3\" color=\"darkgreen\"><b>Expected Output (Click to expand)</b></font></summary>\n",
    "The `model.summary()` function displays a useful summary of the model. Note, the names of the layers may vary as they are auto-generated unless the name is specified.    \n",
    "    \n",
    "```\n",
    "Model: \"my_model\"\n",
    "_________________________________________________________________\n",
    "Layer (type)                 Output Shape              Param #   \n",
    "=================================================================\n",
    "L1 (Dense)                   (None, 25)                10025     \n",
    "_________________________________________________________________\n",
    "L2 (Dense)                   (None, 15)                390       \n",
    "_________________________________________________________________\n",
    "L3 (Dense)                   (None, 10)                160       \n",
    "=================================================================\n",
    "Total params: 10,575\n",
    "Trainable params: 10,575\n",
    "Non-trainable params: 0\n",
    "_________________________________________________________________\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "  <summary><font size=\"3\" color=\"darkgreen\"><b>Click for hints</b></font></summary>\n",
    "    \n",
    "```python\n",
    "tf.random.set_seed(1234)\n",
    "model = Sequential(\n",
    "    [               \n",
    "        ### START CODE HERE ### \n",
    "        tf.keras.Input(shape=(400,)),     # @REPLACE \n",
    "        Dense(25, activation='relu', name = \"L1\"), # @REPLACE \n",
    "        Dense(15, activation='relu',  name = \"L2\"), # @REPLACE  \n",
    "        Dense(10, activation='linear', name = \"L3\"),  # @REPLACE \n",
    "        ### END CODE HERE ### \n",
    "    ], name = \"my_model\" \n",
    ")\n",
    "``` "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[92mAll tests passed!\n"
     ]
    }
   ],
   "source": [
    "# BEGIN UNIT TEST     \n",
    "test_model(model, 10, 400)\n",
    "# END UNIT TEST     "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The parameter counts shown in the summary correspond to the number of elements in the weight and bias arrays as shown below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's further examine the weights to verify that tensorflow produced the same dimensions as we calculated above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "[layer1, layer2, layer3] = model.layers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "W1 shape = (400, 25), b1 shape = (25,)\n",
      "W2 shape = (25, 15), b2 shape = (15,)\n",
      "W3 shape = (15, 10), b3 shape = (10,)\n"
     ]
    }
   ],
   "source": [
    "#### Examine Weights shapes\n",
    "W1,b1 = layer1.get_weights()\n",
    "W2,b2 = layer2.get_weights()\n",
    "W3,b3 = layer3.get_weights()\n",
    "print(f\"W1 shape = {W1.shape}, b1 shape = {b1.shape}\")\n",
    "print(f\"W2 shape = {W2.shape}, b2 shape = {b2.shape}\")\n",
    "print(f\"W3 shape = {W3.shape}, b3 shape = {b3.shape}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Expected Output**\n",
    "```\n",
    "W1 shape = (400, 25), b1 shape = (25,)  \n",
    "W2 shape = (25, 15), b2 shape = (15,)  \n",
    "W3 shape = (15, 10), b3 shape = (10,)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following code:\n",
    "* defines a loss function, `SparseCategoricalCrossentropy` and indicates the softmax should be included with the  loss calculation by adding `from_logits=True`)\n",
    "* defines an optimizer. A popular choice is Adaptive Moment (Adam) which was described in lecture."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1/40\n",
      "157/157 [==============================] - 1s 2ms/step - loss: 1.7094\n",
      "Epoch 2/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.7480\n",
      "Epoch 3/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.4428\n",
      "Epoch 4/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.3463\n",
      "Epoch 5/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.2977\n",
      "Epoch 6/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.2630\n",
      "Epoch 7/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.2361\n",
      "Epoch 8/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.2131\n",
      "Epoch 9/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.2004\n",
      "Epoch 10/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.1805\n",
      "Epoch 11/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.1692\n",
      "Epoch 12/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.1580\n",
      "Epoch 13/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.1507\n",
      "Epoch 14/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.1396\n",
      "Epoch 15/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.1289\n",
      "Epoch 16/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.1255\n",
      "Epoch 17/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.1154\n",
      "Epoch 18/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.1102\n",
      "Epoch 19/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.1016\n",
      "Epoch 20/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0970\n",
      "Epoch 21/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0926\n",
      "Epoch 22/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0891\n",
      "Epoch 23/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0828\n",
      "Epoch 24/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0785\n",
      "Epoch 25/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0755\n",
      "Epoch 26/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0713\n",
      "Epoch 27/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0701\n",
      "Epoch 28/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0617\n",
      "Epoch 29/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0578\n",
      "Epoch 30/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0550\n",
      "Epoch 31/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0511\n",
      "Epoch 32/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0499\n",
      "Epoch 33/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0462\n",
      "Epoch 34/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0437\n",
      "Epoch 35/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0422\n",
      "Epoch 36/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0396\n",
      "Epoch 37/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0366\n",
      "Epoch 38/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0344\n",
      "Epoch 39/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0312\n",
      "Epoch 40/40\n",
      "157/157 [==============================] - 0s 2ms/step - loss: 0.0294\n"
     ]
    }
   ],
   "source": [
    "model.compile(\n",
    "    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n",
    "    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),\n",
    ")\n",
    "\n",
    "history = model.fit(\n",
    "    X,y,\n",
    "    epochs=40\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Epochs and batches\n",
    "In the `compile` statement above, the number of `epochs` was set to 100. This specifies that the entire data set should be applied during training 100 times.  During training, you see output describing the progress of training that looks like this:\n",
    "```\n",
    "Epoch 1/100\n",
    "157/157 [==============================] - 0s 1ms/step - loss: 2.2770\n",
    "```\n",
    "The first line, `Epoch 1/100`, describes which epoch the model is currently running. For efficiency, the training data set is broken into 'batches'. The default size of a batch in Tensorflow is 32. There are 5000 examples in our data set or roughly 157 batches. The notation on the 2nd line `157/157 [====` is describing which batch has been executed."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Loss  (cost)\n",
    "In course 1, we learned to track the progress of gradient descent by monitoring the cost. Ideally, the cost will decrease as the number of iterations of the algorithm increases. Tensorflow refers to the cost as `loss`. Above, you saw the loss displayed each epoch as `model.fit` was executing. The [.fit](https://www.tensorflow.org/api_docs/python/tf/keras/Model) method returns a variety of metrics including the loss. This is captured in the `history` variable above. This can be used to examine the loss in a plot as shown below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "85765547adaa44e197f57464c49a0ab9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plot_loss_tf(history)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Prediction \n",
    "To make a prediction, use Keras `predict`. Below, X[1015] contains an image of a two."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "6d0cfe835bd14b1fb3b8a9fa09b3167e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " predicting a Two: \n",
      "[[ -7.99  -2.23   0.77  -2.41 -11.66 -11.15  -9.53  -3.36  -4.42  -7.17]]\n",
      " Largest Prediction index: 2\n"
     ]
    }
   ],
   "source": [
    "image_of_two = X[1015]\n",
    "display_digit(image_of_two)\n",
    "\n",
    "prediction = model.predict(image_of_two.reshape(1,400))  # prediction\n",
    "\n",
    "print(f\" predicting a Two: \\n{prediction}\")\n",
    "print(f\" Largest Prediction index: {np.argmax(prediction)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The largest output is prediction[2], indicating the predicted digit is a '2'. If the problem only requires a selection, that is sufficient. Use NumPy [argmax](https://numpy.org/doc/stable/reference/generated/numpy.argmax.html) to select it. If the problem requires a probability, a softmax is required:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " predicting a Two. Probability vector: \n",
      "[[1.42e-04 4.49e-02 8.98e-01 3.76e-02 3.61e-06 5.97e-06 3.03e-05 1.44e-02\n",
      "  5.03e-03 3.22e-04]]\n",
      "Total of predictions: 1.000\n"
     ]
    }
   ],
   "source": [
    "prediction_p = tf.nn.softmax(prediction)\n",
    "\n",
    "print(f\" predicting a Two. Probability vector: \\n{prediction_p}\")\n",
    "print(f\"Total of predictions: {np.sum(prediction_p):0.3f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To return an integer representing the predicted target, you want the index of the largest probability. This is accomplished with the Numpy [argmax](https://numpy.org/doc/stable/reference/generated/numpy.argmax.html) function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "np.argmax(prediction_p): 2\n"
     ]
    }
   ],
   "source": [
    "yhat = np.argmax(prediction_p)\n",
    "\n",
    "print(f\"np.argmax(prediction_p): {yhat}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's compare the predictions vs the labels for a random sample of 64 digits. This takes a moment to run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "4190acc8b7cf4617856f2ce6f839880f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import warnings\n",
    "warnings.simplefilter(action='ignore', category=FutureWarning)\n",
    "# You do not need to modify anything in this cell\n",
    "\n",
    "m, n = X.shape\n",
    "\n",
    "fig, axes = plt.subplots(8,8, figsize=(5,5))\n",
    "fig.tight_layout(pad=0.13,rect=[0, 0.03, 1, 0.91]) #[left, bottom, right, top]\n",
    "widgvis(fig)\n",
    "for i,ax in enumerate(axes.flat):\n",
    "    # Select random indices\n",
    "    random_index = np.random.randint(m)\n",
    "    \n",
    "    # Select rows corresponding to the random indices and\n",
    "    # reshape the image\n",
    "    X_random_reshaped = X[random_index].reshape((20,20)).T\n",
    "    \n",
    "    # Display the image\n",
    "    ax.imshow(X_random_reshaped, cmap='gray')\n",
    "    \n",
    "    # Predict using the Neural Network\n",
    "    prediction = model.predict(X[random_index].reshape(1,400))\n",
    "    prediction_p = tf.nn.softmax(prediction)\n",
    "    yhat = np.argmax(prediction_p)\n",
    "    \n",
    "    # Display the label above the image\n",
    "    ax.set_title(f\"{y[random_index,0]},{yhat}\",fontsize=10)\n",
    "    ax.set_axis_off()\n",
    "fig.suptitle(\"Label, yhat\", fontsize=14)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at some of the errors. \n",
    ">Note: increasing the number of training epochs can eliminate the errors on this data set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "ec4df6ce60e9470a9c5b1fe8bef3df02",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "15 errors out of 5000 images\n"
     ]
    }
   ],
   "source": [
    "print( f\"{display_errors(model,X,y)} errors out of {len(X)} images\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Congratulations!\n",
    "You have successfully built and utilized a neural network to do multiclass classification."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "dl_toc_settings": {
   "rndtag": "89367"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
